Spelling Corrector www.aeyec.com Nelson Ford March 25, 2013 This is the spelling corrector subroutines extracted from the AI-C Lookup program. There is no copyright on this. I've tried to add plenty of description of the code. The code works with the AI-C databases which can be downloaded from the link above. In particular, each record in the Words table of AI-C includes a modified Soundex code created with the first Sub below. Public Function Soundex(ByVal WordStr As String) As String Dim str_Word As String Dim i As Long Dim c As String Dim s As String Dim d As String Dim f As String Dim let1 As String Dim wrd As String Dim metaph As String ' 1. Encode the letters, starting with the 2nd letter. ' ' 2. If two adjacent letters have the same soundex code, ' ' treat them as one. ' ' 3. If two consonants are separated by a vowel or Y, use ' ' both consonants. ' ' 4. If two consonants are separated by H or W, treat ' ' them as one. (I.e. rule 3 applies.) ' str_Word = UCase$(Trim$(WordStr)) If str_Word = "" Then Exit Function ' Get rid of non-alpha characters: ' ' (e.g.: o'clock -> oclock) ' For i = 1 To Len(str_Word) c = Mid(str_Word, i, 1) f = c If InStr("ÀÁÂÃÄÅ«", c) > 0 Then f = "A" ElseIf c = "Ç" Then f = "C" ElseIf InStr("ÈÉÊË", c) > 0 Then f = "E" ElseIf InStr("ÌÍÎÏ", c) > 0 Then f = "I" ElseIf c = "Ñ" Then f = "N" ElseIf InStr("ÒÓÔÕÖ", c) > 0 Then f = "O" ElseIf InStr("ÙÚÛÜ", c) > 0 Then f = "U" ElseIf c = "Ý" Then f = "Y" End If If c <> f Then Mid$(str_Word, i, 1) = f If (Not (c Like "[A-Z]")) Then str_Word = Replace(str_Word, c, " ") End If Next i str_Word = Replace$(str_Word, " ", "") If str_Word = "" Then Exit Function ' Change starting letters to actual sounds: ' s = Left$(str_Word, 2) If s = "PS" Or s = "PN" Or s = "KN" Or s = "GN" Or s = "WR" Then str_Word = Mid(str_Word, 2) ElseIf Left$(str_Word, 3) = "WHO" Then str_Word = "H" & Mid$(str_Word, 3) ElseIf s = "WH" Then str_Word = "W" & Mid$(str_Word, 3) ElseIf s = "PH" Then str_Word = "F" & Mid$(str_Word, 3) ElseIf Left$(str_Word, 1) = "X" Then str_Word = "Z" & Mid$(str_Word, 2) End If ' Metaphone changes: ' ' When swapping letters for numbers, "c" ' ' is treated as a sibilant, not as a "k".' str_Word = Replace$(str_Word, "STLE", "SEL") ' whistle, castle, wrestle, gristle - the t is silent ' str_Word = Replace$(str_Word, "SCLE", "SEL") ' muscle - the c is silent ' str_Word = Replace$(str_Word, "CK", "K") ' rack, flock, trick, check - the c is silent ' str_Word = Replace$(str_Word, "CT", "KT") ' doctor, reactor, practice - the c is hard ' str_Word = Replace$(str_Word, "SCIE", "SIE") ' science - the c is silent ' str_Word = Replace$(str_Word, "SCE", "SE") ' scene - the c is silent ' str_Word = Replace$(str_Word, "SCY", "SY") str_Word = Replace$(str_Word, "SC", "SK") ' scary, disco, rascal - the c is hard ' str_Word = Replace$(str_Word, "DGE", "J") ' edge ' str_Word = Replace$(str_Word, "DGY", "JE") ' edgy ' str_Word = Replace$(str_Word, "DGI", "JI") ' edginess ' str_Word = Replace$(str_Word, "TIA", "SHA") str_Word = Replace$(str_Word, "TIO", "SHO") str_Word = Replace$(str_Word, "TCH", "CH") If Right$(str_Word, 2) = "GN" Then str_Word = Left$(str_Word, Len(str_Word) - 2) & "N" ' sign ' If Right$(str_Word, 4) = "GNED" Then str_Word = Left$(str_Word, Len(str_Word) - 4) & "ND" If Right$(str_Word, 5) = "GNING" Then str_Word = Left$(str_Word, Len(str_Word) - 5) & "NING" If Right$(str_Word, 2) = "IC" Then str_Word = Left$(str_Word, Len(str_Word) - 2) & "IK" ' rustic, fantastic ' i = InStr(str_Word, "W") Do While i > 0 If InStr("AEIOU", Mid$(str_Word & " ", i + 1, 1)) = 0 Then ' W not followed by a vowel is silent. ' If i > 1 Then str_Word = RTrim$(Left$(str_Word, i - 1) & Mid$(str_Word & " ", i + 1)) Else str_Word = Mid$(str_Word, 2) End If End If i = InStr(i + 1, str_Word, "W") Loop ' Metaphone says that G is silent in GH, ' ' but the GH becomes F in LAUGHTER. ' ' Change letters to codes, ' ' starting with 2nd letter: ' For i = 2 To Len(str_Word) d = Mid$(str_Word, i, 1) If InStr("AEIOUHWY", d) > 0 Then d = "0" ' zero, not uppercase "o". ' ElseIf InStr("BFPV", d) > 0 Then d = "1" ElseIf InStr("GJKQ", d) > 0 Then d = "2" ElseIf InStr("CSXZ", d) > 0 Then ' Split these from last line because ' ' GJKQ are hard sounds and CSXZ are ' ' sibilants ("s" sounding), other ' ' that the adjustments to C combos ' ' made above. ' d = "7" ElseIf InStr("DT", d) > 0 Then d = "3" ElseIf InStr("L", d) > 0 Then d = "4" ElseIf InStr("MN", d) > 0 Then d = "5" ElseIf InStr("R", d) > 0 Then d = "6" Else If d <> "-" Then Stop d = "" End If Mid(str_Word, i, 1) = d Next wrd = "" ' Remove repeating codes: 50773 -> 5073 ' For i = 1 To Len(str_Word) If Mid(str_Word, i, 1) <> _ Mid(str_Word & " ", i + 1, 1) _ Then wrd = wrd + Mid(str_Word, i, 1) End If Next If str_Word = "" Then Exit Function ' Get rid of vowels (which have been replaced by 0's): ' str_Word = Replace(wrd, "0", "") str_Word = Left$(str_Word & "0000", 4) Soundex = str_Word End Function -------------------------------------------------------------------------------------- As previously noted, the Soundex code is stored in the Words table with each word so the actual spelling corrector starts here. -------------------------------------------------------------------------------------- Public Sub GetSuggestions( _ ByVal str_Word As String, _ ByVal sndx As String) Dim n As Integer Dim l As Integer Dim best As Integer Dim lev As Integer Dim i As Integer Dim j As Integer Dim k As Integer Dim tst1 As Integer Dim tst2 As Integer Dim tst3 As Integer Dim tst4 As Integer Dim sndx2 As String Dim w As String Dim w0 As String Dim w1 As String Dim w2 As String Dim w3 As String Dim w4 As String Dim cons As String Dim cons2 As String Dim c1 As String Dim c2 As String Dim keys(4) As String Dim row As Long Dim col As Long Dim adjacent As Boolean Dim got1 As Boolean Dim wID As Long Dim q As Long Dim q1 As Long Dim q2 As Long Dim q3 As Long Dim sug(5000) As String Dim capWord As String Dim tempWord As String Dim z As Long Dim runOnWords() As String Dim s As String Dim changingSndx As Boolean Dim originalSndx As String If Len(str_Word) < 2 Then Exit Sub ' Display the Suggestions box, "Use highlighted word" button and "Close list" button. lstSuggestions.Clear lstSuggestions.Height = lb_Links.Height - b_CloseSuggestions.Height lstSuggestions.Visible = True b_UseSuggestion.Visible = True b_CloseSuggestions.Visible = True CheckSndx: ' Convert the word to just consonants ' ' except for the first letter. ' s = LCase$(Left$(str_Word, 1)) cons = Replace$(LCase$(str_Word), "a", "") cons = Replace$(LCase$(cons), "e", "") cons = Replace$(LCase$(cons), "i", "") cons = Replace$(LCase$(cons), "o", "") cons = Replace$(LCase$(cons), "u", "") If InStr("aeiou", s) > 0 Then cons = s & cons With WordsRS ' WordsRS is the table which stores the Words and their Soundex codes. ' ' Swap out each letter and look for words: ' .Index = "Text" k = Len(str_Word) For i = 1 To k ' For each letter in the word. ' w2 = str_Word c1 = Mid$(w2, i, 1) ' c1 = a letter ' z = Asc(c1) ' z is the ASCII code for the letter ' For j = 97 To 122 ' a to z ' If j <> z Then ' If letter "j" (a-z) is not "z" (the "i"th letter of the word) ' ' then swap letter "j" into the word and see if it is in the Words table: ' Mid$(w2, i, 1) = Chr$(j) .Seek "=", w2 If Not .NoMatch Then ' Because searches are not case sensitive, ' ' this will get matches regardless of case.' ' Swapping a letter in the word entered by the user ' ' resulted in a match in the Words table so add it ' ' to the Suggestions list: ' lstSuggestions.AddItem !Text ' WordsRS!Text is the word found which is also "w2". End If End If Next Next ' The above swap will find a match with one single-quote, ' ' such as 'em, but not more than 1, such as gen'l'men. ' ' Testing every possible substitution for 2 single-quotes ' ' would be a total of 625 tests, which we can bring down: ' q1 = InStr(str_Word, "'") q2 = InStr(q1 + 2, str_Word, "'") q3 = InStr(q2 + 2, str_Word, "'") If q1 > 1 And q2 > 0 Then ' At least two single-quotes. ' ' Unlikely to have more than 3. ' First quote must come after ' ' the first characters, else ' ' we have to check every word.' w2 = Left$(str_Word, q1 - 1) ' w2= letters before the first quote mark. ' .Seek ">=", w2 ' Words starting with the same letters. ' If Not .NoMatch Then ' Match(es) found. z = Len(w2) q = Len(str_Word) Do Do While q <> Len(!Text) And w2 = LCase$(Left$(!Text, z)) .MoveNext If .EOF Then Stop: Exit Sub ' Shouldn't happen ' Loop If w2 <> Left$(!Text, z) Then ' No more words starting with the specified letters. ' Exit Do End If ' Substitute quote marks in found word: ' tempWord = !Text Mid$(tempWord, q1) = "'" Mid$(tempWord, q2) = "'" If q3 > 0 Then Mid$(tempWord, q3) = "'" If tempWord = str_Word Then lstSuggestions.AddItem !Text End If .MoveNext Loop End If End If w = t_Word MatchSndx: ' Get words with same Soundex code as target word. ' .Index = "Soundex" .Seek "=", sndx ' Target word's SOundex code. ' If Not .NoMatch Then ' CortexRS is where words are linked. ' ' See www.aeyec.com for explanations. ' CortexRS.Index = "WordID" CortexRS.Seek "=", !id If Not CortexRS.NoMatch Then If CortexRS!linkID = 31015 Or CortexRS!linkID = 31040 Then ' Word entered is misspelled (LinkID# 31015 or 31040. ' ' StartID should contain the cID entry for the correct word.' If Not IsNull(CortexRS!startID) Then wID = CortexRS!startID CortexRS.Index = "ID" CortexRS.Seek "=", wID If CortexRS.NoMatch Then Stop If IsNull(CortexRS!WordID) Then Stop Else .Index = "ID" wID = CortexRS!WordID .Seek "=", wID If .NoMatch Then Stop For i = 1 To 6: lstSuggestions.AddItem " ": Next lstSuggestions.AddItem !Text, 0 lstSuggestions.AddItem "Linked in Cortex as the correct spelling of", 2 lstSuggestions.AddItem " " & t_Word, 3 End If Exit Sub End If End If End If ' Add words to the Suggestions list which have the same Soundex code: ' Do While !Soundex = sndx lstSuggestions.AddItem !Text .MoveNext If .NoMatch Then Exit Do End If Loop End If ' Some additional adjustments: ' ' "c" is often left off of words like "acquaint". ' ' The following will pick up the "acq" words: ' If Left$(w, 2) = "aq" Then w = "acq" & Mid$(w, 3) sndx = Soundex(w) ' Calculate the Soundex code for the modified word ' GoTo MatchSndx ' and repeat the above steps. End If ' Look at code to see if digits are reversed. ' ' Example: silibant = S415, sibilant = S145 ' ' Swap first two digits (e.g.: 41 -> 14): ' w2 = sndx c2 = Mid$(w2, 2, 1) Mid$(w2, 2, 2) = Mid$(w2, 3, 1) & c2 .Seek ">=", w2 If Not .NoMatch Then Do While Left$(!Soundex, Len(w2)) = w2 lstSuggestions.AddItem !Text .MoveNext If .EOF Then Exit Do Loop End If ' Look for swapped consonants by swapping ' ' 2nd-3rd digits (e.g.: 15 -> 51) of Soundex: ' w2 = sndx c2 = Mid$(w2, 3, 1) If Not (Mid$(w2, 4, 1) = "0" And c2 = "0") Then ' Don't swap one 0 for another. ' Mid$(w2, 3, 2) = Mid$(w2, 4, 1) & c2 .Seek ">=", w2 If Not .NoMatch Then Do While !Soundex = w2 lstSuggestions.AddItem !Text .MoveNext If .EOF Then Exit Do Loop End If End If ' Search for some other words ' ' not indicated by Soundex: ' ' See if text is a run-on word ' ' using the Subs after this one. ' If Len(str_Word) > 3 Then int_Nested = 0 FindRunOns str_Word, "" FindAffixes str_Word End If ' Typos: ' ' There are three kinds: ' ' 1. hit key adjacent AND the target key. ' ' Example: phjone -> phone ' ' (Could be same key twice: phoone) ' ' 2. hit an adjacent key instead of target. ' ' Example: ohone -> phone ' ' 3. left out the intended letter. ' ' Example: phne -> phone ' ' ' ' For each letter in the submitted word, ' ' check to see if the next letter is adja- ' ' cent to it. If so, assume that it is a ' ' Type 1 error (e.g.: "j" is adjacent to "h") ' Take out the current letter (j) and search' ' for a match in the dictionary. Note that ' ' the target letter could have been hit ' ' twice, such as "phoone", so test it too. ' ' ' ' If the current letter is NOT adjacent to ' ' the next letter, assume it is a Type 2 ' ' error (e.g.: "o" is not adjacent to "h"). ' ' Substitute for the current letter ("o") ' ' each of the letters around that letter ' ' (e.g.: i, k, l, p) and check for words in ' ' the dictionary. ' ' ' ' Note that we have to test each letter of ' ' the submitted word. Even if we find a good' ' word, there may be another/more error(s). ' keys(0) = " " keys(1) = " qwertyuiop " keys(2) = " asdfghjkl '" keys(3) = " zxcvbnm " keys(4) = " " ' Check for dropped first letter: ' .Index = "Text" For j = 97 To 122 ' a - z ' w2 = Chr$(j) & str_Word .Seek "=", w2 If Not .NoMatch Then lstSuggestions.AddItem w2 got1 = True End If Next j = Len(str_Word) For i = 1 To j c1 = LCase$(Mid$(str_Word, i, 1)) c2 = LCase$(Mid$(str_Word, i + 1, 1)) If c2 = "'" And Right$(" " & str_Word, 3) = "in'" And i = j Then ' e.g.: runnin' ' w2 = ChopRight(str_Word, 1) & "g" ' e.g.: running ' .Seek "=", w2 If Not .NoMatch Then lstSuggestions.AddItem w2 got1 = True End If End If If c2 = "-" Then Exit For If c2 > "z" Then Stop: Exit For col = InStr(keys(1), c2) ' keys() was set up above. ' If col > 0 Then row = 1 Else col = InStr(keys(2), c2) If col > 0 Then row = 2 Else col = InStr(keys(3), c2) row = 3 End If End If ' Is first char adjacent to the 2nd char? ' ' Example: "phjone" ' adjacent = False For j = -1 To 1 ' row ' For k = 0 To 2 ' col ' If Mid$(keys(row + j), col + k, 1) = c1 Then ' First char is adjacent to 2nd. ' ' If this is a typo, c1 is likely ' ' an added character. Try leaving ' ' it out and look for the word: ' adjacent = True End If ' Do this for each letter in the word, ' ' not just adjacent ones: ' ' (It doesn't take noticeably longer.) ' w2 = Left$(str_Word, i - 1) & Mid$(str_Word, i + 1) .Seek "=", w2 If Not .NoMatch Then Do While !Text <> w2 And LCase$(!Text) = LCase$(w2) .MoveNext If .EOF Then Exit Do Loop If !Text = w2 Then lstSuggestions.AddItem w2 got1 = True End If End If Next Next If Not adjacent Then ' Not adjacent, so it is more likely ' ' a substitution of a letter than an ' ' addition of a letter. Try swapping ' ' the letters adjacent to c1. ' ' Example: ohone swap i,k,l,p for o ' col = InStr(keys(1), c1) If col > 0 Then row = 1 Else col = InStr(keys(2), c1) If col > 0 Then row = 2 Else col = InStr(keys(3), c1) row = 3 End If End If For j = -1 To 1 If row + j = 0 Then j = 0 ' row 0 is blank ' For k = -1 To 1 w2 = str_Word c2 = Mid$(keys(row + j), col + k, 1) If c2 <> " " Then Mid$(w2, i, 1) = Mid$(keys(row + j), col + k, 1) .Seek "=", w2 If Not .NoMatch Then Do While !Text <> w2 And LCase$(!Text) = LCase$(w2) .MoveNext If .EOF Then Exit Do Loop If !Text = w2 Then lstSuggestions.AddItem w2 got1 = True End If End If End If Next If row + j = 3 Then Exit For ' row 4 is blank ' Next End If ' Swap c1 and c2 and look for a word. ' ' Example: ocme -> come ' w2 = str_Word Mid$(w2, i, 2) = Mid$(w2, i + 1, 1) & Mid$(w2, i, 1) .Seek "=", w2 If Not .NoMatch Then Do While !Text <> w2 And LCase$(!Text) = LCase$(w2) .MoveNext If .EOF Then Exit Do Loop If !Text = w2 Then lstSuggestions.AddItem w2 got1 = True End If End If ' Look for dropped letters: ' ' "wite" was not showing "write" ' ' "nventory" -- "inventory", ' ' "icluding" -- "including", etc. ' If i < Len(str_Word) Then w3 = str_Word If Mid$(str_Word, i, 1) = "'" Then w3 = Left$(w3, i - 1) & Mid$(w3, i + 1) End If ' The above changes 'bout to just bout ' ' then the following routine checks for ' ' the missing letter. Likewise for words' ' like acc'racy. ' For j = 97 To 122 ' a - z ' w2 = Left$(w3, i) & Chr$(j) & Mid$(w3, i + 1) .Seek "=", w2 If .NoMatch Then If i = 1 Then w2 = Chr$(j) & w3 Else w2 = Left$(w3, i - 1) & Chr$(j) & Mid$(w3, i) End If .Seek "=", w2 End If If Not .NoMatch Then Do While !Text <> w2 And LCase$(!Text) = LCase$(w2) .MoveNext If .EOF Then Exit Do Loop If !Text = w2 Then lstSuggestions.AddItem w2 got1 = True End If End If Next End If Next ' Look for dropped letter at end of word: ' ' Not sure if this is needed, but doesn't take long. ' For j = 97 To 122 ' a - z ' w2 = str_Word & Chr$(j) .Seek "=", w2 If Not .NoMatch Then Do While !Text <> w2 And LCase$(!Text) = LCase$(w2) .MoveNext If .EOF Then Exit Do Loop If !Text = w2 Then lstSuggestions.AddItem w2 got1 = True End If End If Next ' Check input starting with non ' ' and followed by good words. ' ' ex: nontoxic - look for toxic ' If Left$(str_Word & " ", 3) = "non" Then w2 = Mid$(str_Word, 4) .Index = "Text" .Seek "=", w2 If Not .NoMatch Then lstSuggestions.AddItem "non" & w2 End If End If ' The syllable -tu- is pronounced "chEW" ' ' (taran-tu-la, even-tu-al) which may ' ' lead people to spell the word that way ' ' (taran-chu-la). So try swapping -chu- ' ' for -tu- and look for a word. (This ' ' is off-base for words where chu is not ' ' a stand-alone syllable, but it doesn't ' ' cost much time to try it.) ' ' Others: ritual (rich-EW-EHl) ' ' actual (ak'chEW-EHl) ' ' punctuation (pungk'chEW-AE'shEHn) ' i = InStr(str_Word, "chu") If i > 0 Then w2 = Replace$(str_Word, "chu", "tu") .Index = "Text" .Seek "=", w2 If Not .NoMatch Then lstSuggestions.AddItem w2 End If End If If InStr(str_Word, "-n-") > 0 Then ' May need to do this for 'n' as well. ' ' cut-n-paste = cut-and-paste ' w2 = Replace$(str_Word, "-n-", "-and-") .Index = "Text" .Seek "=", w2 If Not .NoMatch Then lstSuggestions.AddItem w2 End If End If End With If originalSndx <> "" Then ' See code further down which ' ' sets "originalSndx". ' i = Val(Right$(sndx, 1)) j = Val(Right$(originalSndx, 1)) i = i + 1 If i = j Then i = i + 1 If i < 10 Then sndx = ChopRight(sndx, 1) & i GoTo CheckSndx End If End If '''''''''''''''''''''''''''''''''''''''''' ' We now have a list of suggestions. ' ' Filter them to get reasonable matches: ' '''''''''''''''''''''''''''''''''''''''''' With lstSuggestions If .ListCount > 0 Then ReDim scores(1, .ListCount - 1) As Integer ReDim hiQ(1) As Integer i = 0 ' Remove duplicates: (The list was automatically sorted.) ' Do While i < .ListCount - 1 Do While .List(i) = .List(i + 1) .RemoveItem i Loop i = i + 1 Loop If .ListCount = 0 Then GoTo WikiList End If i = 0 Do While i < .ListCount ' In VB, .ListCount is one more than the ' list index because the index starts with 0. w3 = .List(i) ' w3 is the next word on the list. ' Do While InStr(.List(i), " ") > 0 ' Skip text with spaces, which indicates a special entry. ' i = i + 1 If i = .ListCount Then Exit Do ' exit this Do Loop If i = .ListCount Then Exit Do ' exit the previous Do 10 lines up. If str_Word <> w3 Then ' str_Word is the word entered by the user. If Abs(Len(str_Word) - Len(w3)) > 2 Then scores(0, i) = 0 scores(1, i) = 0 Else ' See Sub qGram and Sub nGram after this sub. ' scores(0, i) = qGram(str_Word, w3) _ - Abs(Len(str_Word) _ - Len(w2)) scores(1, i) = nGram(str_Word, w3) End If For j = 0 To 1 If scores(j, i) > hiQ(j) Then ' Keep track of high score: ' hiQ(j) = scores(j, i) End If Next End If i = i + 1 Loop n = .ListCount - 1 For i = n To 0 Step -1 sug(i) = .List(i) Do While InStr(.List(i), " ") > 0 ' Keep suggestions with spaces because they are probably special cases. ' sug(i) = .List(i) i = i - 1 If i < 0 Then Exit For Loop If str_Word <> sug(i) Then k = LevDist(str_Word, sug(i)) ' Wasn't getting "availability" for ' ' "avaiblity" without LevDist. ' ' See Sub LevDist after this Sub. ' If k < 5 Or (sug(i) <> "zzzz" And scores(0, i) > 0 And _ (scores(1, i) > 0 Or _ Len(str_Word) < 4)) _ Then tst1 = Len(str_Word) > 4 And _ hiQ(0) - scores(0, i) > 2 tst2 = Len(str_Word) > 2 And _ Len(str_Word) < 5 tst3 = scores(0, i) < 2 tst4 = hiQ(1) - scores(1, i) > 1 And _ Len(str_Word) - hiQ(1) < 4 w = Left$(sug(i), 1) cons2 = Replace$(sug(i), "a", "") If Right$(cons2, 2) = "gn" Then cons2 = Left$(cons2, Len(cons2) - 2) & "n" cons2 = Replace$(cons2, "e", "") cons2 = Replace$(cons2, "i", "") cons2 = Replace$(cons2, "o", "") cons2 = Replace$(cons2, "u", "") If InStr("aeiou", w) > 0 Then cons2 = w & cons2 If (tst1 And tst2 And tst3) Or _ tst4 Or _ Abs(Len(str_Word) - Len(w3)) > 2 _ Then If LCase$(cons) <> LCase$(cons2) And k > 3 Then ' This was removing "alright" from "all right" ' ' so I put in the LevDist check. ' sug(i) = "zzzz" ' Deleted suggestion . End If End If ElseIf i < .ListCount And _ str_Word <> Replace$(sug(i), "-", "") And _ str_Word <> Replace$(sug(i), "'", "") _ Then sug(i) = "zzzz" ' Deleted suggestion . Else Stop End If End If Next End If ' Redo suggestion list leaving out deleted suggestions: ' .Clear ' Adding good words on 3rd line down: ' j = 0 Do If sug(j) <> "zzzz" Then .AddItem sug(j) j = j + 1 Loop While sug(j) <> "" Or j < n ' Remove blank lines: ' Do While .List(0) = "" And .ListCount > 0 .RemoveItem 0 Loop ' Move the top suggestion to the top of the list: ' If .ListCount > 0 Then ' Find word in list of about the same ' ' length with the most matching letters: lev = 999 n = 0 Do w3 = .List(n) .RemoveItem n If (Abs(Len(str_Word) - Len(w3)) < 4 Or InStr(w3, " ") > 0) And _ Left$(w3, 2) <> " P" _ Then If InStr(w3, " ") > 0 And Left$(w3, 1) > "9" Then ' Split up a run-on word. ' ' Make it go after after ' ' non-split words with 1. ' .AddItem "2 - " & w3 n = n + 1 Else If str_Word = w3 Then l = 0 Else l = LevDist(str_Word, w3) End If If (l < 3 Or _ (l = 3 And _ Len(str_Word) > 5)) And _ w3 <> " " _ Then ' See if the suggested word is in the Cortex. ' ' Not all words in the Words table are in the ' ' Cortex for various reasons. ' FindWordID w3 ' sets long_WordID ' FindWordIDEntry w3, long_WordID ' sets long_CortexID ' If long_CortexID = 0 Then w3 = w3 & " (Not in Cortex.)" End If .AddItem l & " - " & w3 n = n + 1 If l < lev Then lev = l End If End If End If Loop While n < .ListCount End If End With WikiList: ' Look in Wikipedia's list of common misspellings: ' ' (The Wiki table is in the Words database.) ' With WikiSpellRS .Index = "Wrong" .Seek "=", str_Word If Not .NoMatch Then Do w3 = !Right lstSuggestions.AddItem "0 - " & w3 & " (WikiSpell table)", 0 .MoveNext Loop While !Wrong = w3 End If End With ' I wasn't getting "repertoire" ' ' as a suggestion for "repetoir" ' ' because 2 letters are missing. ' ' This is a heavy-handed attempt ' ' to find such a match, but the ' ' results are still virtually ' ' instantaneous. ' With WordsRS If Len(str_Word) > 5 And Val(lstSuggestions.List(0)) > 1 Then ' This is an arbitrary range, ' ' but I think it is probably ' ' needed more for longer words. ' Skip this if we already have' ' a 1 on the list. ' w2 = Left$(str_Word, 3) .Seek ">=", w2 If Not .NoMatch Then Do i = LevDist(str_Word, !Text) If i < 3 And InStr(!Text, " ") = 0 Then ' arbitrary ' lstSuggestions.AddItem i & " - " & !Text End If .MoveNext If .EOF Then Exit Do Loop While w2 = Left$(LCase$(!Text), 3) ' Remove duplicates: ' i = 0 Do While i < lstSuggestions.ListCount - 1 Do While Replace$(lstSuggestions.List(i), " (Not in Cortex.)", "") = Replace$(lstSuggestions.List(i + 1), " (Not in Cortex.)", "") lstSuggestions.RemoveItem i Loop i = i + 1 Loop End If End If End With ' Add word frequency ranking to the list: ' n = lstSuggestions.ListCount - 1 sug(0) = "" With WordsRS 'WordFreqRS WordsRS.Index = "Text" For i = 0 To n sug(i) = lstSuggestions.List(i) j = Val(sug(i)) w3 = Mid$(sug(i), 5) z = InStr(w3, " (") If z > 0 Or Left$(w3, 1) = "0" Then s = Mid$(w3, z) w3 = Left$(w3, z - 1) Else s = "" End If If InStr(w3, " ") = 0 Then .Seek "=", w3 If Not WordsRS.NoMatch Then If WordsRS!Text <> w3 Then WordsRS.MoveNext k = WordsRS!Text <> w3 Else k = 0 End If If WordsRS.NoMatch Or k Or IsNull(WordsRS!FreqUsed) Then sug(i) = j & " (z) - " & w3 ElseIf WordsRS!FreqUsed < "4" Or str_Word = w3 Or Val(sug(i)) = 1 Then sug(i) = j & " (" & WordsRS!FreqUsed & ") - " & w3 Else sug(i) = "" End If If sug(i) <> "" Then sug(i) = sug(i) & s Else ' Run-on word splits: ' sug(i) = j & " (s) - " & w3 End If Next End With ' Show the completed list: ' With lstSuggestions .Clear .AddItem " Possible corrections for '" & t_Word & "':" .AddItem "z" ' See note below about adding a line without sorting. ' .List(1) = " " k = Val(sug(0)) For i = 0 To n If sug(i) <> "" Then If InStr(sug(i), "- =") > 0 Then ' Indicates the correct form of a made-up input word. ' sug(i) = "0.0.0 (0) - " & Mid$(sug(i), 11) z = 0 ElseIf Len(cons) > 2 Then ' For the words with the smallest LevDist, ' ' the following compares the consonants of ' ' the suggested word to the consonants of ' ' the input to see suggested word has the ' ' smallest LevDist and adds it as ".n" to ' ' the line. Example: for "flowerey", ' ' floweret and flowery both = 1 change, ' ' but flowery has the same consonants. ' ' Don't want to move up run-on words ' ' (which have a space in them). ' j = InStr(sug(i), "-") cons2 = Mid$(sug(i), j + 2) s = Left$(cons2, 1) j = InStr(cons2, "(") If j > 0 Then cons2 = Left$(cons2, j - 2) q2 = Abs(Len(cons2) - Len(t_Word)) cons2 = Replace$(cons2, "a", "") If Right$(cons2, 2) = "gn" Then cons2 = Left$(cons2, Len(cons2) - 2) & "n" cons2 = Replace$(cons2, "e", "") cons2 = Replace$(cons2, "i", "") cons2 = Replace$(cons2, "o", "") cons2 = Replace$(cons2, "u", "") If InStr("aeiou", s) > 0 Then cons2 = s & cons2 z = LevDist(cons2, cons) sug(i) = Left$(sug(i), 1) & "." & z & "." & q2 & Mid$(sug(i), 2) Else sug(i) = Left$(sug(i), 1) & ".0.0" & Mid$(sug(i), 2) End If If z < 10 Then .AddItem sug(i) End If End If Next If .ListCount = 2 Then ' No matches found but not ready to give up. ' ' For example: sucssuful (S714) does not ' ' return successful (S711), but Word gets it.' ' Also, questional (Q574) does not return ' ' questionable (Q571). Word also gets that. ' ' Below is the start of changing the last ' ' digit of the soundex code and retrying. ' ' A block of code further up causes the other' ' digits to be checked. This won't get every-' ' thing, but it nails "successful" and gets ' ' "questionable" into the list, though not ' ' at the top of the list. ' If originalSndx = "" Then originalSndx = sndx If Right$(sndx, 1) = "0" Then sndx = ChopRight(sndx, 1) & "1" Else sndx = ChopRight(sndx, 1) & "0" End If GoTo CheckSndx Else ' No suggestions found. Try getting ' ' pronunciation of word based on its ' ' spelling and compare that to the ' ' NoSyls field in the Pronunciation ' ' table. Example "dere" - "there". ' ' Also search for word in context ' ' such as "would you pease" versus ' ' "would you please". ' 'CompareProunciation - This Sub not written yet. ' .List(2) = " - No suggestions found." Exit Sub End If End If For i = 2 To .ListCount - 1 w3 = .List(i) w3 = Replace$(w3, "(z", "(_") .List(i) = w3 Next ' Add standard stuff to bottom of the list. ' ' .AddItem causes lines to be inserted ' ' alphabetically while .List(n)= just ' ' changes line "n" without moving it into ' ' alphabetical order, so we force these ' ' lines to the bottom by first adding "z" ' ' and then changing it to the desired text: ' .AddItem "z" .List(.ListCount - 1) = " " .AddItem "z" .List(.ListCount - 1) = "1st digit = number of differences." .AddItem "z" .List(.ListCount - 1) = "2nd digit = differences in consonants." .AddItem "z" .List(.ListCount - 1) = "3rd digit = differences in length." .AddItem "z" .List(.ListCount - 1) = "(#) = usage frequency rank." .AddItem "z" .List(.ListCount - 1) = "Double-click a word to rank its frequency." .ListIndex = 2 End With End Sub Public Sub FindRunOns(runOnWords As String, splitWords As String) ' Take text like "fillitup" and split it into "fill it up". ' ' The code works by starting from the end of the word to ' ' look for embedded words. Dim i As Long Dim j As Long Dim k As Long Dim rank As Long Dim splitPoint As Long Dim wrdLen As Long Dim runonWrds As String Dim splitoffWrd As String ' split-off word ' Dim s As String Dim x As String Dim leftSide As String Dim rightSide As String WordsRS.Index = "Text" CortexRS.Index = "WordID" runonWrds = runOnWords wrdLen = Len(runonWrds) splitPoint = wrdLen - 1 ' if left part <> word(s) then reject splitoffWrd ' If Right$(" " & runonWrds, 5) = "yness" Then ' "ness" is a word, so misspellings ending ' ' in "yness" are splitting on it, but it is ' ' not a common word and "yness" is almost ' ' always "iness" so I'm working around it. ' splitPoint = wrdLen - 6 End If Do splitoffWrd = Mid$(runonWrds, splitPoint) ' Skip past endings which are not words ' ' (usually, though "el" is a train). ' If splitoffWrd = "ing" Or splitoffWrd = "ed" Or _ splitoffWrd = "et" Or splitoffWrd = "es" Or splitoffWrd = "ity" Or _ (splitoffWrd = "el" And splitPoint > 2) _ Then splitPoint = splitPoint - 1 If splitPoint = 0 Then Exit Sub splitoffWrd = Mid$(runOnWords, splitPoint) End If If splitPoint > 1 Then leftSide = Left$(runonWrds, splitPoint - 1) ' Look in the Words table: ' WordsRS.Seek "=", splitoffWrd ' right part of (remaining part of) word ' If Not WordsRS.NoMatch Then ' Access thinks "Apple" = "apple" but ' ' we don't, so make sure case matches: ' Do While splitoffWrd <> WordsRS!Text And _ LCase$(splitoffWrd) = LCase$(WordsRS!Text) WordsRS.MoveNext If WordsRS.EOF Then Exit Do End If Loop ' Virtually all "normal" words have been added to the Cortex with their Part of Speech. ' ' Make sure the POS entry is a regular POS (noun, verb, etc.): ' If splitoffWrd = WordsRS!Text Then CortexRS.Seek "=", WordsRS!id If CortexRS.NoMatch Then WordsRS.Seek "=", ";;;" ' Force a "WordsRS.NoMatch" for below. ' Else If CortexRS!linkID = 30230 Then Exit Sub ' Suffix should be handled in FindAffix sub. Do While Not (CortexRS!linkID < 30130 Or _ CortexRS!linkID = 30930) And _ CortexRS!WordID = WordsRS!id ' 30000-30130 are common POS's; 30930=contraction. ' ' This screens out chemical symbols, acronyms, abbreviations, ' ' prefixes, suffixes, letters, symbols, misspelled words, etc. ' ' Change this code if you want any of these included in run-on's. ' CortexRS.MoveNext If CortexRS.EOF Then Exit Do End If Loop If Not (CortexRS!linkID < 30130 Or _ CortexRS!linkID = 30930) Or _ CortexRS!WordID <> WordsRS!id _ Then WordsRS.Seek "=", ";;;" ' ' Force a "WordsRS.NoMatch" for below. ' End If End If Else ' Case didn't match. Force a "Words.NoMatch" for next section. ' WordsRS.Seek "=", ";;;" End If End If ' Check to see if it is a common word: ' rank = 99999 If Not WordsRS.NoMatch Then WordsRS.Seek "=", splitoffWrd If Not WordsRS.NoMatch Then If splitoffWrd <> WordsRS!Text Then ' case doesn't match ' WordsRS.MoveNext End If If splitoffWrd = WordsRS!Text Then rank = Val("" & WordsRS!FreqUsed) ' Frequency Used ranks are: ' 1 = common ' 2 = less common ' 3 = rarely seen ' 4 = jargon, Names ' 5 = taboo/vulgar ' 6 = archaic End If End If ' Right part of word is a good word, ' ' but use it only if the left part ' ' is also one or more good words: ' ' Example: "grandfatherclock" ' ' First word found is "lock", but ' ' "grandfatherc" cannot be used. ' ' 2nd word found is "clock", which ' ' leaves "grandfather" on the left, ' ' so "clock" can be used. ' ' Check for run-ons in "grandfather" ' ' by recursively calling this routine. ' grandfat her ' ' grandf at her ' ' grand fat her (list it) ' ' Once a good split is found, keep ' ' looking from the word to the left ' ' of the first word found in this ' ' loop ("fat", in this example). ' ' However, "grand" doesn't split, so ' ' I won't bother to show it here. ' ' Now we return to "fat" and see if ' ' we can make any more words with it ' ' at the end of a new word: ' ' gran dfat ' ' gra ndfat ' ' gr andfat ' ' g randfat ' ' grandfat ' ' No new words were found with fat ' ' at the end, so we reject "her" and ' ' try to make a new word with "her" ' ' at the end like we did with fat. ' ' grandfa ther ' ' grandf ather ' ' grand father (list it) ' ' ... Again, "grand" doesn't split, ' ' so now we try to make a new word ' ' with "father" at the end: ' ' gran dfather ' ' gra ndfather ' ' gr andfather ' ' g randfather ' ' grandfather (list it) ' ' Then go back to "clock" to see if ' ' it can be part of some larger word:' ' grandfathe rclock ' ' grandfath erclock ' ' grandfat herclock ' ' grandfa therclock ' ' grandf atherclock ' ' grand fatherclock ' ' gran dfatherclock ' ' gra ndfatherclock ' ' gr andfatherclock ' ' g randfatherclock ' ' We are now back to the original ' ' word, so the search is done. ' ' We have found 3 sets of run-ons: ' ' grand fat her clock ' ' grand father clock ' ' grandfather clock ' ' ' ' Note that each of the tests above ' ' is done by recursively calling ' ' the FindRunOns subroutine (below). ' ' ' ' The routine above does not find ' ' words in which an apostrophe or ' ' hyphen has been dropped, such as ' ' "sixoclock" or "runonwords". ' ' This test can be done by adding an ' ' apostrophe and then a hyphen at ' ' each split point and testing again.' ' splitWords = good word(s) on the ' ' right of the original run-on text ' ' passed to this routine recursively.' ' This routine only operates on the ' ' the passed variable: runonWords ' ' ' ' splitoffWrd = words which have ' ' already tested good, like "clock". ' ' ' ' leftSide = the letters to the ' ' left of the splitoffWrd. ' ' ' ' Example: "grandfatherclock" ' ' When splitoffWrd = "clock" then ' ' leftSide = "grandfather". Call ' ' FindRunOns leftSide, splitoffWrd ' ' which this routine receives as the ' ' variables runonWords & splitWords ' s = "" If Len(leftSide) = 1 Then If InStr("aAI", leftSide) = 0 Then rank = 9 '9999 End If If rank < 4 Or IsNull(rank) Then ' ranking of splitoffWord ' ' Use only "common", "less common" and "rare" words. ' WordsRS.Seek "=", leftSide If Not WordsRS.NoMatch Then If WordsRS!FreqUsed < "4" Or IsNull(WordsRS!FreqUsed) Then CortexRS.Index = "WordID" CortexRS.Seek "=", WordsRS!id If CortexRS.NoMatch And Right$(leftSide, 1) = "s" Then ' Simple plurals (adding "s") were not originally not ' ' entered in Cortex, so look for the singular form: ' ' (I'm currently entering them as I enter definitions.) WordsRS.Seek "=", Left$(leftSide, Len(leftSide) - 1) If Not WordsRS.NoMatch Then CortexRS.Seek "=", WordsRS!id End If End If If Not CortexRS.NoMatch Then ' Bingo!! When leftSide is a good word, ' ' add it and the previously split words ' ' on its right to the suggestions list: ' lstSuggestions.AddItem RTrim$(leftSide & " " & splitoffWrd & " " & splitWords) End If End If End If WordsRS.Seek "=", leftSide & "-" & splitoffWrd If Not WordsRS.NoMatch And splitWords <> "" Then ' A hyphenated word was found: runonwords = run-on words ' ' but don't do just runon = run-on because already listed. lstSuggestions.AddItem leftSide & "-" & splitoffWrd & " " & splitWords End If If splitPoint > 2 Then int_Nested = int_Nested + 1 FindRunOns leftSide, Trim$(splitoffWrd & " " & splitWords) ' In addition to passing the word split off in this recursion, ' ' we also pass the splitWords passed to this recursion, if any.' ' Example: input="runonwords", ' ' leftSide="run", splitoffWrd="on" & " " & splitWords="words" ' ElseIf InStr("aAI", leftSide) > 0 Then s = leftSide Else WordsRS.Seek "=", runOnWords If Not WordsRS.NoMatch Then ' If no good split has been found, ' ' then if runOnWords is a good word, ' ' return it: ' splitWords = runonWrds End If End If If s <> "" Then ' Left part is made up of good ' ' words passed back recursively.' splitWords = s & " " & splitoffWrd Else splitPoint = splitPoint - 1 End If Else splitPoint = splitPoint - 1 End If Else splitPoint = splitPoint - 1 End If Loop While splitPoint > 2 If int_Nested > 0 Then int_Nested = int_Nested - 1 End Sub Sub FindAffixes(runOnWords As String) Dim s As String Dim w As String Dim x As String ' No valid split found. Check for suffixes added onto good words: ' Dim w0, w1, w2, w3, w4 As String ' Someone being clever could make a verb ' ' by adding "ify" to an adjective (or to ' ' any word, really), such as the real ' ' word beautify and even uglify (note ' ' the "y" dropped). ' w0 = Right$(" " & runOnWords, 2) ' -fy ' w2 = Right$(" " & runOnWords, 3) ' -ify ' w3 = Right$(" " & runOnWords, 5) ' -ifies ' w4 = Right$(" " & runOnWords, 9) ' -ification ' If w0 = "fy" Or w2 = "ify" Or _ w3 = "ifies" Or _ w4 = "ification" _ Then If w2 = "ify" Then w = Left$(runOnWords, Len(runOnWords) - 3) ElseIf w0 = "fy" Then w = Left$(runOnWords, Len(runOnWords) - 2) ElseIf w3 = "ifies" Then w = Left$(runOnWords, Len(runOnWords) - 5) ElseIf w4 = "ification" Then w = Left$(runOnWords, Len(runOnWords) - 9) End If WordsRS.Index = "Text" w1 = w WordsRS.Seek "=", w If WordsRS.NoMatch Then w1 = w & "a" WordsRS.Seek "=", w1 If WordsRS.NoMatch Then w1 = w & "e" WordsRS.Seek "=", w1 If WordsRS.NoMatch Then w1 = w & "i" WordsRS.Seek "=", w1 If WordsRS.NoMatch Then w1 = w & "o" WordsRS.Seek "=", w1 If WordsRS.NoMatch Then w1 = w & "u" ' unlikely, but may as well try ' WordsRS.Seek "=", w1 If WordsRS.NoMatch Then w1 = w & "y" WordsRS.Seek "=", w1 End If End If End If End If End If End If If Not (WordsRS.NoMatch) Then MsgBox "'" & runOnWords & "' may be a verb form" & Chr$(13) & _ "(possibly made up) of '" & w1 & "'." lstSuggestions.AddItem "= do 'x' with a " & w1 Else ' Didn't find a word with an 'fy' type of ending. ' ' Take off the ending and run FindAffixes again ' ' in case a prefix was also added. ' ' Example: "de-battery-fy" = "undo the battery" ' ' In this example, the -fy is irrelevant, but if ' ' another "word" needs it, add code before doing ' ' the Exit Sub. ' bln_SkipIt = True str_AffixWord = w FindAffixes str_AffixWord Exit Sub End If End If If Right$(" " & runOnWords, 3) = "ity" Then WordsRS.Index = "Text" s = Left$(runOnWords, Len(runOnWords) - 3) ' idiomatic-ity < idiomatic ' WordsRS.Seek "=", s If WordsRS.NoMatch Then If Right$(runOnWords, 5) = "ality" Then ' actu-ality < actual ' s = Left$(runOnWords, Len(runOnWords) - 3) ElseIf Right$(runOnWords, 5) = "guity" Then ' ambigu-ity < ambigu-ous ' s = Left$(runOnWords, Len(runOnWords) - 3) & "ous" ElseIf Right$(runOnWords, 6) = "bility" Then ' A lot of words end in "-bility" from "-able" or "-ible". ' s = Left$(runOnWords, Len(runOnWords) - 5) & "le" ' dur*ab-ility < dur*ab-le ' ' ed*ib-ility < ed*ib-le ' ElseIf Right$(runOnWords, 5) = "ility" Then s = Left$(runOnWords, Len(runOnWords) - 3) & "e" ' ag*il-ity < agil-e ' ElseIf Right$(runOnWords, 5) = "nuity" Then ' Only the four following words end in "-nuity": ' s = Left$(runOnWords, Len(runOnWords) - 3) & "al" ' an*nu-ity < an*nu-al ' ' conti*nu-ity < conti*nu-al ' WordsRS.Seek "=", s If WordsRS.NoMatch Then s = Left$(runOnWords, Len(runOnWords) - 3) & "ous" ' te*nu-ity < te*nu-ous ' WordsRS.Seek "=", s If WordsRS.NoMatch Then s = Left$(runOnWords, Len(runOnWords) - 4) & "ious" ' inge*n-uity < inge*n-ious ' End If End If ElseIf Right$(runOnWords, 4) = "sity" Then ' pompo-sity < pompo-us ' ' Possible made-up words: glorio-sity (glorious), furio-sity (furious) ' s = Left$(runOnWords, Len(runOnWords) - 4) & "us" ElseIf Right$(runOnWords, 5) = "tuity" Then ' Only 3 convertible words end with "tuity" ' ' and they each have different conversions: ' s = Left$(runOnWords, Len(runOnWords) - 3) & "al" ' perpe*tu-ity < perpetu-al ' WordsRS.Seek "=", s If WordsRS.NoMatch Then s = Left$(runOnWords, Len(runOnWords) - 3) & "ous" ' fa*tu-ity < fatu-ous ' WordsRS.Seek "=", s If WordsRS.NoMatch Then s = Left$(runOnWords, Len(runOnWords) - 3) & "itous" ' for*tu-ity < fortu-itous ' End If End If End If WordsRS.Seek "=", s & "ness" If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' is not in the Words table." & Chr$(13) & _ "The correct form may be: '" & s & "ness'." lstSuggestions.AddItem "= " & s & "ness" Exit Sub End If WordsRS.Seek "=", s End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a noun form" & Chr$(13) & _ "(possibly made up) of '" & s & "'." Else Stop End If ElseIf Right$(" " & runOnWords, 4) = "able" Or Right$(" " & runOnWords, 4) = "ible" Then WordsRS.Index = "Text" s = Left$(runOnWords, Len(runOnWords) - 4) ' alert(able), access(ible) ' WordsRS.Seek "=", s If WordsRS.NoMatch Then ' If a match, then a good word was found (obviously). ' s = s & "e" WordsRS.Seek "=", s End If x = "" If WordsRS.NoMatch Then s = ChopRight(s, 2) x = Right$(s, 1) WordsRS.Seek "=", s End If If Not WordsRS.NoMatch Then Dim vbWID As Long vbWID = WordsRS!id CortexRS.Index = "WordID" CortexRS.Seek "=", vbWID WordsRS.Index = "ID" WordsRS.Seek "=", CortexRS!WordID Do While WordsRS!Class <> "verb" And CortexRS!WordID = vbWID CortexRS.MoveNext If CortexRS.EOF Then Stop Loop If WordsRS!Class <> "verb" Or CortexRS!WordID <> vbWID Then ' not a verb Else MsgBox "'" & runOnWords & "' may be a word (possibly made up) meaning" & Chr$(13) & _ " 'Something which someone can " & s & ".'" lstSuggestions.AddItem "= " & s & x & "able" End If Else Stop End If ElseIf Right$(" " & runOnWords, 7) = "iferous" Then ' Most existing words with "iferous" suffixes are ' ' technical words (culmiferous, papuliferous) but ' ' presumably, made-up words would tack "iferous" ' ' onto an easily recognizable word. ' WordsRS.Index = "Text" s = ChopRight(runOnWords, 7) ' e.g.: pest(iferous) ' WordsRS.Seek "=", s If WordsRS.NoMatch Then w = s & "e" WordsRS.Seek "=", w End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made up " & Chr$(13) & _ "word meaning 'having " & s & "'." lstSuggestions.AddItem "= having " & s End If ElseIf Right$(" " & runOnWords, 3) = "ism" Then WordsRS.Index = "Text" s = ChopRight(runOnWords, 3) ' ' WordsRS.Seek "=", s w = s If WordsRS.NoMatch Then w = s & "e" ' ' WordsRS.Seek "=", w End If If WordsRS.NoMatch Then w = s & "y" ' ' WordsRS.Seek "=", w End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made up word" & Chr$(13) & _ "meaning 'strongly pro-/anti- " & w & "'." lstSuggestions.AddItem "= strongly pro-/anti- " & w End If ElseIf Right$(" " & runOnWords, 3) = "ite" Then WordsRS.Index = "Text" s = ChopRight(runOnWords, 3) ' ' WordsRS.Seek "=", s w = s If WordsRS.NoMatch Then w = s & "e" ' ' WordsRS.Seek "=", w End If If WordsRS.NoMatch Then w = s & "y" ' ' WordsRS.Seek "=", w End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made up word" & Chr$(13) & _ "meaning 'a follower/part of " & w & "'." lstSuggestions.AddItem "= follower/part of " & w Else Stop End If ElseIf Right$(" " & runOnWords, 3) = "ize" Then WordsRS.Index = "Text" s = ChopRight(runOnWords, 3) ' actual(ize) ' WordsRS.Seek "=", s If WordsRS.NoMatch Then w = s & "e" ' acclimat(ize) + e ' WordsRS.Seek "=", w End If If WordsRS.NoMatch Then w = s & "y" ' agon(ize) + y ' WordsRS.Seek "=", w End If If WordsRS.NoMatch Then w = ChopRight(s, 1) & "tic" ' synthe(s)(ize) + tic ' WordsRS.Seek "=", w End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a verb form" & Chr$(13) & _ "(possibly made up) of '" & s & "'." lstSuggestions.AddItem "= make/become " & s Else Stop End If ElseIf Right$(" " & runOnWords, 4) = "less" Then WordsRS.Index = "Text" s = Left$(runOnWords, Len(runOnWords) - 4) ' leash-less ' WordsRS.Seek "=", s If WordsRS.NoMatch And Right$(runOnWords, 5) = "iless" Then s = ChopRight(s, 1) & "y" WordsRS.Seek "=", s If WordsRS.NoMatch Then s = ChopRight(s, 1) & "ey" WordsRS.Seek "=", s End If End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a noun form" & Chr$(13) & _ "(possibly made up) meaning:" & Chr$(13) & _ " 'without a " & s & "'." lstSuggestions.AddItem "= " & s & "less" End If ElseIf Right$(" " & runOnWords, 4) = "logy" Then ' usually -ology ' WordsRS.Index = "Text" If Right$(runOnWords, 5) = "ology" Then s = ChopRight(runOnWords, 5) ' techn(ology) ' Else s = ChopRight(runOnWords, 4) ' genea(logy) ' End If WordsRS.Seek "=", s ' Made-up words usually have a full word ' ' with "ology" on the end, but we can ' ' look for other words if a full word is ' ' not found. If len(w) is <4, we would ' ' get too many matches, so keep it >3. ' If WordsRS.NoMatch And Len(w) > 3 Then WordsRS.Seek ">=", w End If If Not WordsRS.NoMatch Then If Len(WordsRS!Text) - Len(s) < 3 Then MsgBox "'" & runOnWords & "' may be a made-up word" & Chr$(13) & _ "meaning the study of '" & WordsRS!Text & "'." lstSuggestions.AddItem "= study of " & s End If End If ElseIf Right$(" " & runOnWords, 4) = "ment" Then WordsRS.Index = "Text" s = Left$(runOnWords, Len(runOnWords) - 4) ' wisdom-ment ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a (made-up) noun form" & Chr$(13) & _ "meaning the state or condition of '" & s & "'." lstSuggestions.AddItem "= " & s & "-ment" End If Exit Sub ElseIf Right$(" " & runOnWords, 4) = "ness" Then WordsRS.Index = "Text" s = Left$(runOnWords, Len(runOnWords) - 4) ' alert-able ' WordsRS.Seek "=", s If WordsRS.NoMatch And Right$(runOnWords, 5) = "iness" Then s = ChopRight(s, 1) & "y" WordsRS.Seek "=", s If WordsRS.NoMatch Then s = ChopRight(s, 1) & "ey" WordsRS.Seek "=", s End If End If If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a noun form" & Chr$(13) & _ "(possibly made up) of '" & s & "'." lstSuggestions.AddItem "= " & s & "-ness" End If ElseIf Right$(" " & runOnWords, 3) = "ous" Then ' forms adjectives which have the general ' ' sense of "possessing" or "full of". ' ' glory, glorious; nerve, nervous; wonder,' ' wonderous, but not all words ending in ' ' "ous" are like this; e.g.: prodigious ' ' -acious and -icious = "full of ...". ' ' A similar suffix is "licious", as in ' ' bootylicious, babelicious, etc., though ' ' it is not very widely used since it has ' ' a rather narrow meaning (delicious) and ' ' there is only so many ways to say that ' ' a babe is "delicious". ' w = runOnWords WordsRS.Index = "Text" If Right$(" " & w, 7) = "licious" Then s = ChopRight(w, 7) ' e.g.: burger(licious) ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & w & "' may be a made-up word" & Chr$(13) & _ "meaning a 'delicious " & s & "'." lstSuggestions.AddItem "= delicious " & s Exit Sub End If End If If Right$(" " & w, 6) = "acious" Or Right$(" " & w, 6) = "icious" Then s = ChopRight(w, 6) ' aud(acious), avar(icious) ' ' Made-up words with these endings seems unlikely. ' ' While the suffix has a clear meaning (full of), ' ' the root parts of real words are not whole words.' ' But it doesn't cost much to leave this code in. ' ' Also, when people make up words, it seems that ' ' they would be more likely to use a whole word as ' ' the root so that it could be more recognizable. ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & w & "' may be a made-up word" & Chr$(13) & _ "meaning 'full of " & s & "'." lstSuggestions.AddItem "= full of " & s Exit Sub End If End If If Right$(" " & w, 3) = "ous" Then s = ChopRight(w, 3) WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & w & "' may be a made-up word" & Chr$(13) & _ "meaning 'full of " & s & "'." lstSuggestions.AddItem "= " & s & "-ness" Exit Sub End If End If ElseIf Right$(" " & runOnWords, 7) = "-lutely" Then ' Catch abso-"x"-lutely where x is an embedded word. " ' Example: abso-bloomin'-lutely, abso-friggin'-lutely " If Left$(runOnWords, 5) = "abso-" Then s = Mid$(runOnWords, 6, Len(runOnWords) - 12) MsgBox "'" & runOnWords & "' is 'absolutely' with " & Chr$(13) & _ " '" & s & "' embedded in it." lstSuggestions.AddItem "= absolutely" End If ElseIf Left$(runOnWords & " ", 4) = "auto" Then WordsRS.Index = "Text" s = Mid$(runOnWords, 5) ' (auto)magically ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made-up word" & Chr$(13) & _ "meaning that '" & s & "' is automatic." lstSuggestions.AddItem "= auto-" & s End If ElseIf Left$(runOnWords & " ", 2) = "de" Then WordsRS.Index = "Text" s = Mid$(runOnWords, 3) ' (de)classify ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made-up word" & Chr$(13) & _ "meaning the same as 'undo the " & s & "'." lstSuggestions.AddItem "= undo the " & s End If ElseIf Left$(runOnWords & " ", 3) = "non" Then WordsRS.Index = "Text" s = Mid$(runOnWords, 4) ' (non)refundable ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made-up word" & Chr$(13) & _ "meaning the same as 'not " & s & "'." lstSuggestions.AddItem "= not " & s End If ElseIf Left$(runOnWords & " ", 3) = "sub" Then WordsRS.Index = "Text" s = Mid$(runOnWords, 4) ' subparticle, particle ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then lstSuggestions.AddItem "= part of/below " & s End If ElseIf Right$(" " & runOnWords, 4) = "wise" Then WordsRS.Index = "Text" s = ChopRight$(runOnWords, 4) ' length(wise) ' WordsRS.Seek "=", s If Not WordsRS.NoMatch Then MsgBox "'" & runOnWords & "' may be a made-up word" & Chr$(13) & _ "meaning 'in the manner of (a) " & s & "'." lstSuggestions.AddItem "= in the manner of [a] " & s End If End If End Sub Private Sub FindBlends(blended As String) ' A blended word (also known as "portmanteaus") is ' ' a single word formed by combining the start of one ' ' word with the last part of another. Examples: ' ' "brunch" = br(eakfast) + (l)unch ' ' "Spanglish" = Span(ish) + (En)glish ' ' "urinalysis" = urin(e) + (an)alysis ' ' Example of a 3-word blend: ' ' "turducken" = tur(key) + du(ck) + (chi)cken ' ' (chicken stuffed into duck stuffed into turkey.) ' ' Many blended words are in the dictionary, but ' ' their nature is that people make them up frequently. ' For example, "smog" (smoke+fog) is well established, ' but "smaze" (smoke+haze) is rarely (if ever) seen, ' ' and if used, would probably not be understood, so ' ' using "smaze" instead of just saying "smoke and ' ' haze" is just an affectation. ' ' An episode of "How I Met Your Mother" featured: ' ' "turturkeykey", a turkey stuffed inside a turkey, ' ' but this is not a blended word. It is a word inside' ' a word, like "abso-friggin'-lutely". ' ' To search for blends of two words, we start with ' ' the first two letters of the blended word in word1 ' ' and the rest in word2. We look in the Words table ' ' Text field for words starting with word1 letters ' ' and in the Backwards field for words ending with ' ' the word2 letters. ' ' A blended word could have many possible matches. ' ' "brunch" could be break+lunch, brow+hunch, etc. ' ' Over 1000 words in AI-C start with "br" and about ' ' 20 end with "unch". That's a lot of possible word ' ' combinations. Even a blend like "affluenza", which ' ' is "af" with only one possible match, influenza, ' ' has to deal with a lot of words starting with "af".' ' On the flip side, it could be a word starting with ' ' "affluen", which leaves a lot of words ending with ' ' "za", or any combination inbetween, such as "aff" ' ' plus "luenza", "affl" plus "uenza", etc. ' ' The bottom line is that there are too many possible' ' results and no way to know which is correct for it ' ' to be worth looking for them. ' End Sub Public Function LevDist(ByVal wrd As String, _ ByVal sug As String) As Integer ' Called by "Sub GetSuggestions". ' ' Compute Levenshtein Distance. (From Wikipedia:) ' ' The Levenshtein distance between two strings is ' ' the minimum number of edits needed to transform ' ' one string into the other, with the allowable ' ' edit operations being insertion, deletion, or ' ' substitution of a single character. ' ' The Levenshtein distance between "kitten" and ' ' "sitting" is 3, since the following three edits ' ' are the minimum to change one into the other: ' ' 1.kitten > sitten (substitution of 's' for 'k') ' ' 2.sitten > sittin (substitution of 'i' for 'e') ' ' 3.sittin ? sitting (insert 'g' at the end). ' ' The smaller the LevDist, the more likely that ' ' sug is the correct replacement for wrd. ' Dim dist() As Integer ' Levenshtein matrix Dim sug_Len As Integer ' length of sug Dim wrd_Len As Integer ' length of wrd Dim wrd_i As Integer ' iterates through wrd Dim sug_i As Integer ' iterates through sug Dim wrd_Char As String ' ith character of wrd Dim sug_Char As String ' jth character of sug Dim chngs As Integer ' number of changes needed to make wrd = sug ' Dim cell As Integer Dim trans As Integer Dim min As Integer Dim min2 As Integer Dim w As String Dim i As Long Dim j As Long Dim k As Long ' Ignore case. If user entered "pasover", ' ' he would not be given "Passover". ' wrd = LCase$(wrd) sug = LCase$(sug) If InStr(sug, " ") Then LevDist = 1 Exit Function End If wrd_Len = Len(wrd) sug_Len = Len(sug) If wrd_Len = 0 Then LevDist = sug_Len Exit Function End If If sug_Len = 0 Then LevDist = wrd_Len Exit Function End If ReDim dist(0 To wrd_Len, 0 To sug_Len) As Integer For wrd_i = 0 To wrd_Len dist(wrd_i, 0) = wrd_i Next For sug_i = 0 To sug_Len dist(0, sug_i) = sug_i Next For wrd_i = 1 To wrd_Len wrd_Char = Mid$(wrd, wrd_i, 1) For sug_i = 1 To sug_Len sug_Char = Mid$(sug, sug_i, 1) If wrd_Char = sug_Char Then chngs = 0 Else chngs = 1 End If min = dist(wrd_i - 1, sug_i) + 1 If dist(wrd_i, sug_i - 1) + 1 < min Then min = dist(wrd_i, sug_i - 1) + 1 End If If dist(wrd_i - 1, sug_i - 1) + chngs < min Then min = dist(wrd_i - 1, sug_i - 1) + chngs End If dist(wrd_i, sug_i) = min ' Check for transposition of characters (teh = the): ' If wrd_i > 1 And sug_i > 1 Then If wrd_Char = Mid$(sug, sug_i - 1, 1) And _ sug_Char = Mid$(wrd, wrd_i - 1, 1) _ Then If dist(wrd_i, sug_i) >= dist(wrd_i - 2, sug_i - 2) + chngs Then dist(wrd_i, sug_i) = dist(wrd_i - 2, sug_i - 2) + chngs trans = 99 End If End If End If ' The following count the substitution of ' several letters. For example, if the user ' entered a word ending in "shun" instead ' of "tion", that should be a 1-change error ' instead of a 3-letter change because it's ' 1 error causing the change. ' Check for "f" in place of "ph": ' If wrd_Char = "f" And Mid$(sug, sug_i, 2) = "ph" Then w = LTrim$(Left$(" " & wrd, wrd_i)) & "ph" & _ RTrim$(Mid$(wrd & " ", wrd_i + 1)) If w = sug Then LevDist = 1 Exit Function End If End If ' "shun" in place of "tion" or "sion": ' If Mid$(wrd & " ", wrd_i, 4) = "shun" Then w = LTrim$(Left$(" " & wrd, wrd_i)) & "sion" & _ RTrim$(Mid$(wrd & " ", wrd_i + 4)) If w = sug Then LevDist = 1 Exit Function Else w = LTrim$(Left$(" " & wrd, wrd_i)) & "tion" & _ RTrim$(Mid$(wrd & " ", wrd_i + 4)) If w = sug Then LevDist = 1 Exit Function End If End If End If ' "kel" in place of "cle" (debacle): ' If Mid$(wrd & " ", wrd_i, 3) = "kel" Then w = LTrim$(Left$(" " & wrd, wrd_i)) & "cle" & _ RTrim$(Mid$(wrd & " ", wrd_i + 3)) If w = sug Then LevDist = 1 Exit Function End If End If ' "ite" in place of "ight" (lite, nite): ' If Mid$(wrd & " ", wrd_i, 3) = "ite" Then w = Replace$(wrd, "ite", "ight") If w = sug Then LevDist = 1 Exit Function End If End If Next Next ' Eye dialect edits, such as changing "ah" to ' ' "er", are counted as 2 edits; reduce to 1. ' If dist(wrd_Len, sug_Len) > 1 Then If Right$(wrd, 2) = "ah" And Right$(sug, 2) = "er" Then ' mistah - eye dialect for mister ' dist(wrd_Len, sug_Len) = dist(wrd_Len, sug_Len) - 1 End If i = InStr(wrd, "ee") w = wrd k = 0 Do While i > 0 j = Len(w) If i = 1 Then w = "i" & Mid$(w, 3) ' ees = is ' k = k + 1 ElseIf i = j - 2 Then ' can't think of any examples ' Exit Do Else ' meester = mister ' w = Left$(w, i - 1) & "i" & Mid$(w, i + 2) k = k + 1 End If i = InStr(w, "ee") Loop If w = sug Then dist(wrd_Len, sug_Len) = dist(wrd_Len, sug_Len) - k End If If InStr(wrd, "-n-") > 0 Then If Replace(wrd, "-n-", "-and-") = sug Then dist(wrd_Len, sug_Len) = dist(wrd_Len, sug_Len) - 1 End If End If End If LevDist = dist(wrd_Len, sug_Len) Erase dist End Function Private Function Minimum(ByVal a As Integer, _ ByVal b As Integer, _ ByVal c As Integer) As Integer ' Called by "Sub GetSuggestions". ' Dim mi As Integer mi = a If b < mi Then mi = b If c < mi Then mi = c Minimum = mi End Function Public Function nGram(w1 As String, w2 As String) As Integer ' Called by "Sub GetSuggestions". ' Dim gram(1, 50) As String Dim i As Integer Dim j As Integer Dim n As Integer Dim skipped As Integer If Abs(Len(w1) - Len(w2)) > 2 Then Exit Function End If For i = 1 To Len(w1) gram(0, i) = Mid$(w1, i, 2) Next For j = 1 To Len(w2) gram(1, j) = Mid$(w2, j, 2) Next n = 0 i = 1 j = 1 Do If gram(0, i) = gram(1, j) Then If skipped < 2 Then n = n + 1 End If If skipped > 0 Then skipped = skipped - 1 End If ElseIf gram(0, i + 1) = gram(1, j) Then n = n + 1 i = i + 1 If skipped = 0 And gram(0, i - 1) <> gram(1, j - 1) Then skipped = skipped + 2 Else skipped = skipped + 1 End If ElseIf gram(0, i) = gram(1, j + 1) Then n = n + 1 j = j + 1 If skipped = 0 And gram(0, i - 1) <> gram(1, j - 1) Then skipped = skipped + 2 Else skipped = skipped + 1 End If End If i = i + 1 j = j + 1 Loop While i <= Len(w1) And j <= Len(w2) nGram = n End Function Public Function qGram(w1 As String, w2 As String) As Integer ' Called by "Sub GetSuggestions". ' Dim gram(1, 50) As String Dim i As Integer Dim j As Integer Dim n As Integer Dim skipped As Integer If Abs(Len(w1) - Len(w2)) > 2 Then Exit Function End If For i = 1 To Len(w1) gram(0, i) = Mid$(w1, i, 1) Next For j = 1 To Len(w2) gram(1, j) = Mid$(w2, j, 1) Next i = 1 j = 1 n = 0 Do If gram(0, i) = gram(1, j) Then 'If 2+ letters skipped over to 'find a match, skip an equal 'number of matches: If skipped < 2 Then n = n + 1 End If If skipped > 0 Then skipped = skipped - 1 End If ElseIf gram(0, i + 1) = gram(1, j) Then n = n + 1 i = i + 1 skipped = skipped + 1 ElseIf gram(0, i) = gram(1, j + 1) Then n = n + 1 j = j + 1 skipped = skipped + 1 End If i = i + 1 j = j + 1 Loop While i <= Len(w1) And j <= Len(w2) qGram = n End Function Private Function Metaphone(w As String) As String Dim i As Long Dim c As String Dim c2 As String ' This routine makes adjustments to a word to ' ' make it be spelled more the way it sounds ' ' before calculating its Soundex. ' ' This appears to be intended more for names ' ' than for regular words. The Metaphone 2 ' ' algorithm is more precise, but again, it is ' ' intended for use in differentiating names. ' ' My experience is that there are no hard and ' ' fast rules for reliably computing the pro- ' ' nunuciation of regular words. Although most ' ' spelling errors in typed text result from ' ' typos, not from using words which sound like' ' other words. ' If InStr("ae gn kn pn wr ", Left$(w, 2)) > 0 Then w = Mid$(w, 2) If Right$(w, 2) = "mb" Then w = Left$(w, Len(w) - 1) w = Replace(w, "cia", "sha") w = Replace(w, "ci", "si") w = Replace(w, "ce", "se") w = Replace(w, "cy", "sy") w = Replace(w, "sci", "si") w = Replace(w, "sce", "se") w = Replace(w, "scy", "sy") w = Replace(w, "ch", "j") w = Replace(w, "c", "k") w = Replace(w, "kk", "k") ' line above changed ck to kk; second k silent in ck ' w = Replace(w, "ph", "f") ' okay for "telephone" but not for "flophouse". ' w = Replace(w, "q", "k") w = Replace(w, "sio", "sho") w = Replace(w, "sia", "sha") w = Replace(w, "tia", "sha") w = Replace(w, "tio", "sho") w = Replace(w, "tch", "j") w = Replace(w, "v", "f") i = InStr(2, w, "w") Do While i > 0 If InStr("aeiou", Mid$(w & " ", i + 1, 1)) = 0 Then ' w not followed by a vowel ' w = Left$(w, i - 1) & Mid$(w, i + 1) End If i = InStr(i + 2, w, "w") Loop w = Replace(w, "x", "ks") i = InStr(2, w, "y") Do While i > 0 If InStr("aeiou", Mid$(w & " ", i + 1, 1)) = 0 Then ' y not followed by a vowel ' w = Left$(w, i - 1) & Mid$(w, i + 1) End If i = InStr(i + 2, w, "y") Loop w = Replace(w, "z", "s") Metaphone = w End Function